Reinforcement Learning through Global Stochastic Search in N-MDPs
Authors
Abstract
Reinforcement Learning (RL) in either fully or partially observable domains usually imposes a requirement on the knowledge representation in order to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a way that a Markovian representation would be computationally very expensive. An alternative formulation of the decision problem involves partially specified behaviors with choice points. While this reduces the complexity of the policy space that must be explored (which is crucial for realistic autonomous agents that must bound their search time), it renders the domain Non-Markovian. In this paper, we present a novel algorithm for reinforcement learning in Non-Markovian domains. Our algorithm, Stochastic Search Monte Carlo, performs a global stochastic search in policy space, shaping the distribution from which the next policy is selected by estimating an upper bound on the value of each action. We show experimentally that, in domains challenging for RL, high-level decisions in Non-Markovian processes can yield behavior at least as good as that learned by traditional algorithms, with significantly fewer samples.
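The abstract leaves the algorithm at a high level. The sketch below illustrates one way its core loop could look, as we read it from the abstract: policies over a small set of choice points are sampled from a distribution shaped by UCB-style upper bounds on each action's value, and the whole-episode return credits every choice made, with no Markov assumption. All names here (ssmc, simulate, the particular bound) are our illustrative assumptions, not the authors' published implementation.

    import math
    import random
    from collections import defaultdict

    def ssmc(simulate, n_choice_points, n_actions, episodes=10000, c=1.0):
        """Global stochastic search over deterministic policies.

        simulate(policy) runs one episode under policy (a tuple with one
        action per choice point) and returns its total return; it is an
        assumed interface. The sampling of the next policy is shaped by
        a UCB-style upper bound on each action's value at each point.
        """
        counts = defaultdict(lambda: [0] * n_actions)     # visits per (point, action)
        returns = defaultdict(lambda: [0.0] * n_actions)  # summed episode returns

        def upper_bound(p, a, t):
            n = counts[p][a]
            if n == 0:
                return float("inf")  # force every action to be tried once
            return returns[p][a] / n + c * math.sqrt(math.log(t + 1) / n)

        for t in range(episodes):
            # Sample the next policy from the optimistic bounds (random
            # tie-breaking): a global search step, not a local tweak.
            policy = tuple(
                max(range(n_actions),
                    key=lambda a: (upper_bound(p, a, t), random.random()))
                for p in range(n_choice_points)
            )
            g = simulate(policy)  # Monte Carlo evaluation of the whole policy
            for p, a in enumerate(policy):
                counts[p][a] += 1
                returns[p][a] += g  # credit the full return to each choice made

        # Report the empirically best action at each choice point.
        return tuple(
            max(range(n_actions),
                key=lambda a: returns[p][a] / counts[p][a] if counts[p][a] else -1e18)
            for p in range(n_choice_points)
        )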
Similar resources
A Generalized Reinforcement-Learning Model: Convergence and Applications
Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDP...
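The snippet above is truncated; as generic background for the MDP formalization it mentions, here is a standard tabular Q-learning loop. The env.reset()/env.step() interface is an assumption made for this sketch, not an API from the cited paper.

    import random

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular Q-learning in a finite MDP.

        env.reset() returns a start state; env.step(s, a) returns
        (next_state, reward, done). Both are assumed interfaces.
        """
        q = [[0.0] * n_actions for _ in range(n_states)]
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    a = random.randrange(n_actions)
                else:
                    a = max(range(n_actions), key=lambda a_: q[s][a_])
                s2, r, done = env.step(s, a)
                # one-step temporal-difference update toward the Bellman target
                target = r + (0 if done else gamma * max(q[s2]))
                q[s][a] += alpha * (target - q[s][a])
                s = s2
        return q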
Solving Markov Decision Processes in Metric Spaces
We present an approximation scheme for solving Markov Decision Processes (MDPs) in which the states are embedded in a metric space. Our algorithm has a time bound of Õ(n log log n), where n is the number of states and ε is the approximation factor. This bound is independent of the numerical size of the input and the discount factor and is hence strongly polynomial. We present the result for determin...
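The snippet breaks off before describing the scheme itself. For background only, here is a standard value-iteration solver for a finite MDP; it is not the metric-space approximation of the cited paper, whose details the snippet does not give, and the P/R table layout is an assumption.

    def value_iteration(P, R, gamma=0.9, eps=1e-6):
        """Generic value iteration for a finite MDP.

        P[s][a] is a list of (probability, next_state) pairs and R[s][a]
        the immediate reward; both are assumed inputs.
        """
        n = len(P)
        v = [0.0] * n
        while True:
            v_new = [
                max(R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
                    for a in range(len(P[s])))
                for s in range(n)
            ]
            if max(abs(a - b) for a, b in zip(v, v_new)) < eps:
                return v_new
            v = v_new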
State Aggregation in Monte Carlo Tree Search
Monte Carlo tree search (MCTS) algorithms are a popular approach to online decision-making in Markov decision processes (MDPs). These algorithms can, however, perform poorly in MDPs with high stochastic branching factors. In this paper, we study state aggregation as a way of reducing stochastic branching in tree search. Prior work has studied formal properties of MDP state aggregation in the co...
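To make the idea concrete, the sketch below shows one plausible expansion step in which sampled successor states are pooled by an aggregation function, so the tree keeps one child per abstract cluster rather than one per concrete sample. sample_next and aggregate are assumed interfaces; the cited paper's actual algorithm may differ.

    from collections import defaultdict

    def expand_with_aggregation(state, action, sample_next, aggregate,
                                n_samples=32):
        """One expansion step of a tree search with state aggregation.

        Instead of creating a child node for every sampled next state
        (high stochastic branching), samples are grouped by
        aggregate(next_state) and the tree keeps one child per group.
        sample_next(state, action) returns (next_state, reward).
        """
        children = defaultdict(lambda: {"visits": 0, "reward_sum": 0.0, "rep": None})
        for _ in range(n_samples):
            s2, r = sample_next(state, action)
            c = children[aggregate(s2)]
            c["visits"] += 1
            c["reward_sum"] += r
            if c["rep"] is None:
                c["rep"] = s2  # keep one concrete representative per cluster
        return children  # branching factor = number of clusters, not samples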
Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers
In stochastic planning problems formulated as factored Markov decision processes (MDPs), also called dynamic belief network MDPs (DBN-MDPs) (Boutilier, Dean, & Hanks 1999), finding the best policy (or conditional plan) is NP-hard. One of the difficulties comes from the fact that the number of conditionals required to specify the policy can grow to be exponential in the size of the representatio...
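The snippet ends before the greedy search itself is described; below is a hedged sketch of what a greedy coordinate search over deterministic finite-state controllers can look like in general, with evaluate as an assumed policy-evaluation oracle. It illustrates the search pattern only, not the cited paper's method.

    import random

    def greedy_fsc_search(evaluate, n_nodes, n_actions, n_obs, iters=200):
        """Greedy local search over deterministic finite-state controllers.

        A controller is (act, nxt): act[q] is the action taken in memory
        node q, and nxt[q][o] is the next node after observation o.
        evaluate(act, nxt) returns the controller's estimated value and
        is an assumed interface.
        """
        act = [random.randrange(n_actions) for _ in range(n_nodes)]
        nxt = [[random.randrange(n_nodes) for _ in range(n_obs)]
               for _ in range(n_nodes)]
        best = evaluate(act, nxt)
        for _ in range(iters):
            q = random.randrange(n_nodes)
            # Greedily pick the best action at one node, holding the rest fixed.
            scores = {}
            for a in range(n_actions):
                act[q] = a
                scores[a] = evaluate(act, nxt)
            act[q] = max(scores, key=scores.get)
            best = scores[act[q]]
        return act, nxt, best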
Publication date: 2011